A VECTOR AND METHOD FOR PREPARATION OF DNA LIBRARIES

Field of the Invention 

The invention relates to the field of recombinant DNA. In particular, it 
relates to the field of DNA libraries. 

Background to the invention 

The generation of high quality cDNA libraries is essential for gene 
identification strategies based on high throughput sequencing, on phenotypic 
expression in bacteria, yeast, or mammalian cells, on identification of interaction 
partners through the yeast two-hybrid system, or on recovery of cDNAs cognate to
rare mRNAs. All of the varieties of expression cloning, for example, depend on the
creation of, or access to, high quality cDNA libraries. Some of the key features of a
high quality library include a large number of independent clones (preferably
greater than 10^7), a high percentage of inserts, oriented insertion of the genes, a
high percentage of full length gene sequences and a population of clones which is
representative of the starting population of RNAs. Many of the vectors and
systems currently used for preparation of cDNA libraries have features which
inherently limit the capacity to produce high quality libraries. To achieve the
desired goal of orientation in a cDNA library, it is necessary to prepare cDNA with
non-complementary ends. A typical strategy is to reverse transcribe the target RNA
using an oligo-dT primer which comprises at its 5' end a restriction endonuclease
recognition site. After preparing the complementary strand, adaptors are ligated
onto the double stranded cDNA. These adaptors generate ends which are not
complementary to the ends resulting from restriction of the site incorporated in the
5' end of the primer. The cDNA is then cut with this enzyme, purified to remove
fragments and ligated into a suitably cleaved vector. To minimize loss of gene
sequences, the enzymes used in this strategy (most commonly Not I) restrict DNA
with low frequency. However, Not I is known to restrict DNA within coding
elements, and several important genes have been identified (from non-oriented
libraries) that contain Not I restriction sites.

Another difficulty in the preparation of high quality libraries is background.
For a variety of reasons, typical libraries have backgrounds (vector without insert)
of 2 - 10%. Even though these backgrounds are sufficiently low for purposes such
as expression cloning, they are unacceptable in high cost operations such as
automated sequencing. Several strategies have been developed which reduce the
incidence of background in cDNA libraries. The most common technique is to
dephosphorylate the vector. A dephosphorylated vector cannot ligate to itself. By
ligating a phosphorylated insert to a dephosphorylated vector, significant reductions
in background can be obtained. However, vector dephosphorylation reduces
ligation efficiency and thus library size. Furthermore, if the restriction enzyme used
to cut the vector does not cut to completion, high backgrounds will result. In a
variation on this strategy, the vector is cut with one restriction enzyme,
dephosphorylated, then cut with two additional restriction enzymes to produce a
phosphorylated vector with non-complementary ends which can then be used in
high-efficiency ligations with phosphorylated inserts. However, this technique leads
to ends which can still ligate to one another to produce vector dimers or insert
dimers. The most efficient incorporation of cDNA is by methods that produce
non-selfcomplementary ends on both cDNA and vector. However, methods described
to date do not allow directional insertion of cDNA. In yet another method to reduce
background, toxic elements have been incorporated into vectors. These vectors are
designed so that the cDNA insertion site resides within the promoter or coding
elements of inducible genes which express products toxic to the host cell in the
presence of an inducing agent. If cDNA is inserted in the vector, then the toxic
gene will be interrupted and the host containing this plasmid will survive in the
presence of inducer, whereas host cells containing plasmid vector without insert will
die. However, the toxicity may be incomplete, resulting in a number of slow
growing clones without insert. In addition, traces of nuclease activity can
contribute substantially to background by changing the reading frame of restricted
plasmid prior to ligation. Thus there is a need in the art for better methods of
preparing oriented, representational cDNA libraries with very low backgrounds.

Detailed Description of the invention 

One aspect of this invention is a nucleotide polymer comprising two 
elements, one that binds to the nucleotide elements of another nucleic acid polymer 
and an element that comprises the recognition and cleavage site for an 
endonuclease. The binding elements include by way of example, but without 
limitation, random hexamers, random nonamers, homopolymers (deoxythymidine
homopolymers in particular), sequence-specific nucleotides which bind to specific 
nucleic acid polymers, sequence-specific nucleotides which bind to known types or 
classes of nucleic acid polymers, and combinations of the above. The polymer of the 
invention is prepared by standard chemical or biological methods (Itakura, et al., 53
Ann. Rev. Biochem. 323-356 (1984); Current Protocols in Molecular Biology, Vol.
1, Ausubel, et al., Eds., John Wiley & Sons, New York (1997)) known to those of
skill in the art. The length of this nucleotide polymer element is variable and is 
dependent on conditions affecting the specificity of the binding and is typically 
between 4 and 200 nucleotides in length, with a preferred length between 6 and 30 
nucleotides. 

The element of the nucleotide polymer of the invention that comprises the
recognition and cleavage site for an endonuclease is comprised of a nucleotide
sequence that forms one complementary strand of a double stranded DNA sequence
that is bound by an endonuclease and is cleaved by it. This nucleotide sequence is
recognized only by endonucleases that cleave the coding elements of target
genomes with a frequency less than 100 times per genome or by endonucleases
which do not cleave cDNA. The target genome is the DNA which comprises the
source of the nucleic acid polymer to which the nucleotide polymer of the invention
binds. Examples of endonucleases that cleave the coding elements of target
genomes with a frequency of less than 100 times per genome are the intron
endonucleases. These endonucleases are intron-encoded enzymes that, under
optimal conditions, recognize and cleave asymmetric DNA sequences of unusual
length (14-31 bp). Intron endonucleases include by way of example, but without
limitation, PI-Sce I (VDE), I-Ceu I, I-Tli I and I-Ppo I. Optimum conditions are
those which are known to provide maximum specificity and enzyme activity. The
preferred intron endonuclease of the invention is VDE. This enzyme has an
unusually long asymmetric recognition and cleavage site, with only 1 known
recognition and cleavage site in S. cerevisiae, and none in E. coli. By way of
example, but without limitation, the wildtype recognition and cleavage sequence for
this enzyme is 5'-TATGTCGGGTGCGGAGAAAGAGGTAATGAAA-3' (Gimble and
Wang, 263 J. Mol. Biol. 163-180 (1996)). It is known that
this sequence is somewhat degenerate, i.e. base changes at certain locations
enhance, decrease or have no impact on binding or cleavage. For example,
substitutions of T for C at position -1 and G for C at position 6 generate a site
which is more readily cleaved by VDE than the wild type. Other substitutions can
be introduced at sites with known degeneracy to facilitate cleavage of the sequence,
increase the GC content of the 3' overhang or introduce other changes as may be
deemed valuable by those of skill in the art. Furthermore, substitution of Mn for Mg
decreases the specificity of the enzyme. Consequently, changes to this sequence or
to enzyme reaction conditions that preserve the functionality of the endonuclease
(binding and cleavage) and that do not increase the frequency of cleavage to greater
than 100 times per genome are within the scope of this invention.
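
The rarity criterion above can be illustrated with a rough back-of-envelope sketch. It assumes a genome of uniform, independent random bases, which real genomes are not, and it ignores the site degeneracy noted above, so the numbers are only an order-of-magnitude guide. The function name and the round genome size are assumptions introduced for illustration; the 8 bp length is that of the Not I recognition site.

```python
def expected_chance_sites(site_length: int, genome_bp: float) -> float:
    """Expected number of chance occurrences of one specific site,
    read along one strand, in a genome of uniform random bases
    (a simplifying assumption, not a property of real genomes)."""
    return genome_bp / 4 ** site_length


HUMAN_GENOME_BP = 3.0e9  # approximate haploid size, illustrative only

not_i_sites = expected_chance_sites(8, HUMAN_GENOME_BP)   # 8 bp site (Not I)
vde_sites = expected_chance_sites(31, HUMAN_GENOME_BP)    # 31 bp site (VDE)

print(f"8 bp site:  about {not_i_sites:.3g} expected occurrences")
print(f"31 bp site: about {vde_sites:.3g} expected occurrences")
```

Under these assumptions an 8 bp site is expected tens of thousands of times in a genome of this size, whereas a fully specified 31 bp site is expected far less than once, which is consistent with the threshold of fewer than 100 sites per genome used in the text.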

Other enzymes whose recognition and cleavage sites are useful elements of
the invention are those that recognize only modified nucleotides not normally found
in cDNA or DNA amplified by in vitro technologies. One class of enzymes which
meets this criterion is the methylation-specific endonucleases. These enzymes
recognize only methylated DNA, and because cDNA or in vitro amplified DNA is
not normally methylated (unless methyl nucleotides are deliberately introduced), they
will not cleave these DNAs. The nucleotide sequence of the invention is methylated
by chemical or enzymatic methods to produce a site which is cleaved by the
methylation-specific endonuclease to produce a unique end. By way of example,
but without limitation, a nucleotide sequence comprising overlapping Cla I sites can
be methylated with Cla I methylase to produce a Dpn I recognition and cleavage
site. Because Dpn I recognizes only methylated DNA, it will not cleave
cDNA or in vitro amplified DNA. This example is provided only to illustrate the
means by which those of skill in the art may identify and modify nucleotide
sequences which serve as unique binding sites for endonucleases which recognize
and cleave only modified nucleotides.

Other elements may be added to the nucleotide polymer of the invention as
may be considered useful by those of skill in the art. These include by way of
example, but without limitation, recognition and cleavage sites for other restriction
endonucleases, sequences complementary to nucleotide polymers commonly used
for sequencing, and recognition sites for DNA binding proteins. It is known in the
art that the restriction and cleavage site for an endonuclease can be added to an
existing DNA or RNA by techniques such as ligation of a double stranded
oligonucleotide comprising the sequence for the restriction site, or by priming with
a single stranded oligonucleotide comprising the sequence for the restriction site
followed by extension of the first strand and synthesis of second strand, to create a
duplex DNA which can be recognized and cleaved by the enzyme. The principal
advantage of this invention is that it allows insertion of DNA into vectors without
risk of cleaving the DNA which is being inserted. It is particularly useful in orienting
DNA within vectors with little or no risk of cleaving the DNA which is being
oriented.

It is an object of this invention to provide a method for inserting one DNA 
sequence within another DNA sequence wherein there is little or no risk of cleaving
the inserted DNA during subsequent manipulation of the DNA. In the method of
the invention, adaptors are ligated to the inserted DNA before ligating the inserted 
DNA to the DNA sequence within which the DNA is to be inserted (vector). The 
vector comprises within the DNA insertion site one or more recognition and 
cleavage sites for an endonuclease which has less than 100 recognition and cleavage 
sites within the genome of the insert (rare endonuclease). The adaptors that are 
ligated to the insert DNA comprise either the full sequence of the recognition and 
cleavage site for the rare endonuclease of the invention, which must then be cleaved 
prior to insertion into the vector, or part of the recognition and cleavage site of the 
rare endonuclease which when ligated with the vector reconstitutes the full 
recognition and cleavage site for the rare endonuclease of the invention. The 
recognition and cleavage site for the rare endonuclease of the invention, if not 
already present, may be added to the vector using the same strategies as described 
for the insert DNA. When the insert of the method is ligated to the vector of the 
method, one or more recognition and cleavage sites for the rare endonuclease of the 
invention are regenerated. 

It is another object of the invention to provide a vector with improved
selection against clones lacking an insert. This vector has an element located
between two polylinker sites which comprises a conditionally lethal gene. The
region between the two polylinker sites also constitutes the cDNA insertion site. If
the lethal gene is not removed during the process of vector preparation, host cells
which are susceptible to the lethal gene product will be killed when transformed
with this construct. In a preferred vector, the conditionally lethal gene is the kilA
gene from the broad host range plasmid RK2. In host strains lacking the repressor
genes korA and korB, the kilA gene is expressed and kills the host (Kornacki et al.,
1993; Larsen and Figurski, 1994; Thomas et al., 1995; Thomson et al., 1993). On
the other hand, vector DNA containing the kilA gene segment is prepared in good
yield by standard methods from E. coli strains which constitutively express korA
and korB. When the vector is transformed into a korA+, korB+ E. coli strain, a
normal transformation frequency (~10^9 colonies per ug DNA using
electrocompetent cells) is observed. The transformation frequency of kor- bacteria,
on the other hand, is between 1 and 10 colonies per ug DNA. This provides
extremely powerful selection against vector.
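
The strength of this selection can be put in numbers with a short sketch. It uses only the two transformation frequencies stated above; the function name is a hypothetical helper introduced for illustration.

```python
def selection_fold(permissive_cfu_per_ug: float, restrictive_cfu_per_ug: float) -> float:
    """Ratio of colonies obtained in a host that tolerates the kilA vector
    to colonies obtained in a host that is killed by it."""
    return permissive_cfu_per_ug / restrictive_cfu_per_ug


PERMISSIVE = 1e9            # approximate colonies per ug in a kor+ host, from the text
RESTRICTIVE_RANGE = (1, 10)  # colonies per ug in a kor- host, from the text

for cfu in RESTRICTIVE_RANGE:
    fold = selection_fold(PERMISSIVE, cfu)
    print(f"{cfu} colonies/ug in kor- host: about {fold:.0e}-fold selection")
```

On these figures, vector that retains the kilA stuffer is recovered roughly 10^8 to 10^9 times less often in a kor- host than in a kor+ host.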

It is yet another object of this invention to provide an improved vector for
cloning of mammalian genes, expression of proteins in mammalian cells and
expression cloning. In the vector of the invention, the CMV promoter of commonly
used expression vectors is replaced with the more powerful and stable broad host
range promoter from EF-1α. The polylinker region of the vector is flanked by
endonuclease sites which, when cleaved, yield non-complementary ends. One of
these ends is complementary to the cleavage product of an endonuclease that does
not cleave cDNA. In a preferred form of the vector, the ends are complementary to
the cleavage product of an intron endonuclease. The preferred endonuclease is
VDE. Other sites downstream of the insertion site are provided to confer
properties of improved stability and expression in transfected mammalian cells.
These include the EBNA-1 transcription unit, SV40 and human growth hormone
polyadenylation signal sequences, human IgG1 H/CH2 splice site, the Epstein-Barr
virus OriP, puromycin acetyl transferase gene and transcription terminators. The
preferred transcription termination sites are bidirectional, e.g. those from circular
viral genomes, such as those of the papova virus family, or synthetic bidirectional
polyadenylation and termination signals (Figure 1).



It is an object of this invention to provide a method for the preparation of
oriented plasmid cDNA libraries with greater than 10^8 primary transformants per ug
of poly(A)+ RNA. Such libraries are prepared without the use of bacteriophage or
bacteriophage vectors. The vector of this method is a plasmid with two or more
endonuclease recognition and cleavage sites which, when cleaved by one or more
endonucleases, give noncomplementary ends. In a preferred vector, these sites are
recognized by a single endonuclease with a degenerate cleavage site which can
provide an overhang of 4 or more bases with two or more deoxyguanosines and/or
deoxycytidines. The preferred site in this method is recognized and cleaved by
Bst XI.



In the method of the invention, first strand synthesis is initiated with a 
nucleotide primer which binds to the target nucleic acid. These include, by way of 
example, but without limitation, random hexamers, random nonamers,
homopolymers (deoxythymidine homopolymers in particular), sequence-specific 
nucleotides which bind to specific nucleic acid polymers, sequence-specific 
nucleotides which bind to known types or classes of nucleic acid polymers, and 
combinations of the above. Another element of the primer is the recognition and 
cleavage site for an endonuclease which is either not found in cDNA or found in 
very low frequency in target genomes. Examples of endonuclease recognition and 
cleavage sites that are not found in cDNA include by way of example but without 
limitation, the sites for intron endonucleases VDE, I-Ceu I, I-Tli I and I-Ppo I as 
well as for the methylation specific enzyme Dpn I. The preferred site is recognized 
and cleaved by the intron endonuclease VDE (Gimble and Stephens, 1995; Gimble 
and Thorner, 1992; Gimble and Thorner, 1993; Gimble and Wang, 1996). 
Cleavage of this site leaves a 4 base, 3' overhang that is rich in GC content but not 
palindromic (GTGC). This is an important feature in the preparation of large size 
cDNA libraries as it prevents end-to-end ligation of cDNA during ligation with 
vector. The high GC content of the overhang increases the stability of overlapping 
complementary ends. This in turn enhances ligation efficiency, which in turn
enhances library size. Substitutions of T for C at position -1 and G for C at position
6 generate a site which is more readily cleaved by VDE than the wild type. Other 
substitutions can be introduced at sites with known degeneracy to facilitate desired 
changes in the sequence. The primer of the invention is used to initiate reverse 
transcription. Many reverse transcriptases and some thermostable polymerases 
with reverse transcriptase activity have been described as being useful in the 
synthesis of first strand cDNA. First strand cDNA is synthesized by these or related 
enzymes according to standard procedures. This step is followed by second strand 
synthesis and ligation of phosphorylated, non-selfcomplementary adaptors. Second 
strand synthesis is performed using a suitable DNA polymerase such as T4 DNA 
polymerase or E. coli DNA polymerase I, and priming strategies such as RNase H
treatment. Other enzymes, such as thermostable polymerases, and/or other priming 
strategies may be used which are known to those of skill in the art. Phosphorylated 
adaptors are selected, annealed and ligated to the cDNA essentially as described 
previously (Seed and Aruffo, 1987; Aruffo and Seed, 1987a). The key feature of
this method is that the adaptors are non-selfcomplementary, i.e. annealing of the 
two strands of the adaptor generates an overhang which is not complementary to 
itself. By ligating the adaptors in large molar excess over the cDNA, end-to-end 
ligation of the cDNA is minimized. Although large amounts of end-to-end ligations 
of the adaptors occur, this is more than offset by efficient ligation of adaptors to the 
cDNA. 
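
The two overhang properties discussed above, non-palindromic sequence and GC content, can be checked with a minimal sketch. The overhang GTGC is taken from the text; the helper names are hypothetical, and the palindromic Eco RI overhang AATT is included only as a familiar comparison.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_self_complementary(overhang: str) -> bool:
    """True if two copies of the same overhang could anneal to each other."""
    return overhang.upper() == reverse_complement(overhang)


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


for name, overhang in [("VDE", "GTGC"), ("Eco RI (comparison)", "AATT")]:
    print(name, overhang,
          "self-complementary:", is_self_complementary(overhang),
          "GC fraction:", gc_fraction(overhang))
```

GTGC is not its own reverse complement (which is GCAC), so two cDNA ends carrying it cannot anneal to one another, and three of its four bases are G or C.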

The endonuclease site (introduced through the primer) is then cleaved, 
leaving nonidentical, noncomplementary ends. It is preferred that the endonuclease 
not cleave the cDNA or that it cleave with very low frequency. The cDNA is 
fractionated and DNA greater than 0.5 - 1 kb in length is ligated to the vector and 
transformed into a suitable host. The preferred method of fractionation is 
potassium acetate gradients, although size exclusion chromatography, other density 
gradients or other techniques known to those of skill in the art may be used. 



The preferred vector of the method has a cDNA insertion site which 
comprises a toxic stuffer gene, preferably the kilA gene of the invention, and two
flanking restriction sites, cleavage of which leaves non-selfcomplementary
overhangs that are complementary to those of the adaptor and the cleaved intron
endonuclease site on the cDNA. The overhang which is complementary to the
adaptor will be located at the 5' end of the cDNA in the preferred version. This 
strategy has the benefit of providing oriented inserts, and cDNA and vector which 
are completely non-selfcomplementary. The cDNA cannot ligate to itself, nor can 
the vector, in any orientation. As a result the cDNA is assimilated into the vector 
with maximal efficiency, in the correct orientation, with low background. 
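
The ligation logic of this paragraph can be sketched as a small compatibility check. The model is deliberately simplified: every overhang is treated as a 4 base 3' overhang written 5' to 3', and two ends are taken to ligate only if one overhang is the exact reverse complement of the other. The cDNA overhangs are GTGC (the cleaved VDE site, from the text) and CCCC (the single-stranded tail of the 12-mer adaptor in Example 1); the vector overhangs are assumed here to be their reverse complements, and all function names are hypothetical.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def can_anneal(a: str, b: str) -> bool:
    """Two overhangs of the same polarity anneal if one is the
    reverse complement of the other (simplified model)."""
    return a.upper() == reverse_complement(b)


cdna_ends = {"adaptor": "CCCC", "VDE": "GTGC"}
# Assumption: vector ends are engineered as exact complements of the cDNA ends.
vector_ends = {name: reverse_complement(seq) for name, seq in cdna_ends.items()}


def self_ligations(ends: dict) -> list:
    """Pairs of ends within one molecule type that could anneal."""
    return [(x, y) for x, y in combinations_with_replacement(ends, 2)
            if can_anneal(ends[x], ends[y])]


def cross_ligations(insert: dict, vector: dict) -> list:
    """Insert end / vector end pairs that could anneal."""
    return [(x, y) for x in insert for y in vector
            if can_anneal(insert[x], vector[y])]


print("cDNA to cDNA:    ", self_ligations(cdna_ends))
print("vector to vector:", self_ligations(vector_ends))
print("cDNA to vector:  ", cross_ligations(cdna_ends, vector_ends))
```

In this model no cDNA end pairs with another cDNA end, no vector end pairs with another vector end, and each cDNA end pairs with exactly one vector end, which is what fixes the orientation of the insert.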

Other vector-based strategies for producing low background may be 
coupled with the intron endonuclease/non-self complementary adaptor strategy for 
producing large unbiased libraries. To achieve the large library sizes afforded by 
the method of the invention, such vectors must have phosphorylated, non- 
selfcomplementary ends which are complementary to the ends of the cDNA. A 
preferred enzyme for restriction of the preferred vector is Bst XI. This enzyme has 
a degenerate cleavage site which facilitates the selection of overhangs which are 
complementary to more than one restriction site and which may be manipulated to 
have other useful features such as high GC content. Thus, by engineering the 
correct sequences into different Bst XI restriction sites flanking the insertion site, a
single restriction digest can generate ends which are complementary to both the 
adaptor and the cleaved VDE site. Other enzymes with degenerate cleavage sites 
known to those of skill in the art may be used to leave overhangs on the vector that 
are capable of annealing and ligating to the non-selfcomplementary adaptors 
described above and to the overhang generated by the intron endonuclease. 
Alternatively, more than one enzyme may be used to generate the appropriate 
vector configuration. It is generally advantageous, but not essential, that the 
overhang created by the adaptor is complementary to the overhang generated by the 
enzyme used to cleave the insertion site of the vector. However, the vector must be 
treated in such a way that non-selfcomplementary overhangs are created on the
vector which are complementary to the non-selfcomplementary overhangs on the
cDNA. This can be achieved by ligating adaptors to an appropriate restriction
cleavage site of the vector or by similar techniques known to those of skill in the
art. Alternative strategies which require the use of two enzymes or multiple 
manipulations of the vector are not preferred because they increase cost and the 
extra manipulations tend to reduce efficiency of library preparation. 

For purposes of preparing a cDNA library, the plasmid vector containing the
kilA gene is amplified in a korA+, korB+ strain such as MC1061/p3 and isolated by
standard methods. The purified plasmid is then digested with restriction enzymes 
that flank the kilA gene fragment. The kilA gene fragment is then separated from 
the vector by methods such as gel electrophoresis, size exclusion chromatography, 
density gradient centrifugation and other techniques known to those of skill in the 
art. The vector is then ready for further manipulations or immediate ligation of 
cDNA. 

It is an object of this invention to provide a method for orienting DNA 
inserts within vectors with little or no risk of cleaving the inserted DNA. In this 
method, the same approach used to prepare oriented cDNA inserts is used
generically to orient DNA in plasmids and other DNA vectors used in the cloning
and manipulation of DNA, with little or no risk of cleaving the inserted DNA. 

Example 1. Preparation of unamplified cDNA library from human small
intestine

A cDNA library with 1.1 x 10^8 recombinant clones was prepared from
human small intestine. The cDNA was prepared from 2.5 ug of poly(A)+ RNA
using an oligo-dT-VDE primer comprising 18 bases of dT and 66 bases including
31 bases of the original VDE recognition site (set off by spaces). The sequence of the
primer is:

5'-CGACGTTGTAAAACGACGGCCAGTGAATTCTC TATGTCGGGTGCGGAGAAAGAGGTAATGAAA TACTTTTTTTTTTTTTTTTTT-3'

Detailed protocol: The poly(A)+ RNA was diluted with DEPC-H2O to 22 ul and mixed
with 2 ul of the primer (1 ug/ul). The mixture was incubated at 70°C for 10 minutes
and transferred to ice for 5 minutes. The following components were added to the
sample for the first strand cDNA synthesis: 8 ul of 5x 1st strand buffer [250 mM
Tris-HCl (pH 8.3), 375 mM KCl, 15 mM MgCl2], 4 ul of 10 mM dNTP, 2 ul of 0.1
M DTT, 1 ul (40 U) of RNase inhibitor (Life Technologies, Inc., Gaithersburg,
MD) and 1 ul (36 U) of AMV RT-XL (Life Science Research Products, Orlando,
FL). The sample was incubated at 42°C for 1 hr and 70°C for 10 minutes. The
second strand cDNA was prepared by adding the following reagents to the first
strand cDNA sample: 70 ul of H2O, 30 ul of 5x 2nd strand buffer [100 mM Tris-HCl
(pH 6.9), 450 mM KCl, 23 mM MgCl2, 0.75 mM β-NAD+, 50 mM (NH4)2SO4], 2
ul of 0.1 M DTT, 3 ul of 10 mM dNTP, 0.5 ul (2 U) of RNase H (Life
Technologies, Inc., Gaithersburg, MD), 4 ul (40 U) of E. coli DNA polymerase I
(Life Technologies, Inc., Gaithersburg, MD) and 1 ul (10 U) of E. coli DNA ligase
(Life Technologies, Inc., Gaithersburg, MD). The sample was incubated for 2 hours at
16°C; 5 ul (5 U) of T4 DNA polymerase (Boehringer Mannheim,
Indianapolis, IN) was then added and the sample was incubated for 5 minutes at 16°C.
The reaction was terminated
by adding 10 ul of 0.5 M EDTA. The cDNA was purified by phenol extraction and
ethanol precipitation. The ligation of the Bst XI adaptor to the cDNA was
performed by adding the following reagents to 20 ul of the cDNA sample in H2O:
10 ul of phosphorylated Bst XI adaptors (5'-CTGGCTCA-3';
5'-TGAGCCAGCCCC-3') (10 ug), 10 ul of ligation buffer [330 mM Tris-HCl, 50
mM MgCl2, 5 mM ATP], 7 ul of 0.1 M DTT and 5 ul (5 U) of T4 DNA ligase
(Life Technologies, Inc., Gaithersburg, MD), and incubating overnight at 16°C. The
cDNA was purified by phenol extraction and ethanol precipitation, then digested
with 20 units of VDE (New England Biolabs, Beverly, MA) at 37°C for 6 hours.
The cDNA was fractionated on a potassium acetate gradient (5 - 20%) for 3 hours
at 50,000 rpm in a Beckman L5-50 centrifuge using an SW-50 rotor. The cDNA
fragments greater than 700 bases were collected, concentrated by ethanol
precipitation and dissolved in 50 ul of TE (10 mM Tris-HCl, pH 8.0, 1 mM EDTA).
The pEAK8 vector was digested with Bst XI using the manufacturer's recommended
procedures (New England Biolabs, Beverly, MA) and purified on a potassium
acetate gradient as above to remove the kilA stuffer. After ethanol precipitation,
the vector was dissolved in TE at 50 ng/ul. 47 ul of the vector was mixed with 47 ul
of the fractionated cDNA, 258 ul of H2O, 94 ul of 5x T4 DNA ligase buffer (Life
Technologies, Inc., Gaithersburg, MD) and 24 ul of ligase (1 U/ul) (Life
Technologies, Inc., Gaithersburg, MD). The ligation mix was incubated for 2 hours at
room temperature. The DNA sample was desalted by ethanol precipitation and
dissolved in 30 ul of H2O. The DNA was electroporated into electrocompetent E.
coli DH10B cells in 10 equal fractions, 0.3 ml per fraction. The transformed cells
were incubated in SOC medium for 45 minutes before adding glycerol to 15% to
make the frozen stock (-70°C) of the cDNA library. To check the library titer, 10 ul
of the library stock was diluted 200 times and plated on ampicillin LB-agar plates.
The average size of the cDNA inserts was analyzed by extracting the plasmid DNA
from 24 randomly selected colonies and digesting it with the restriction enzymes
flanking the cDNA insert. In summary, the primary cDNA library for human small
intestine contains 1.1 x 10^8 primary transformants with the average size of the cDNA
inserts at 2.3 kb. The vector (without insert) background in this library is less than
1%.
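
As a bookkeeping check on Example 1, the following sketch totals the volumes given for the first strand reaction and the vector ligation and confirms that each 5x buffer ends up at 1x. It also expresses the library size per ug of poly(A)+ RNA. All volumes and counts are taken from the example; the function name is a hypothetical helper.

```python
def final_buffer_strength(stock_fold: float, buffer_ul: float, other_ul: list) -> tuple:
    """Return (final buffer strength, total volume in ul) for a reaction
    assembled from one concentrated buffer plus other components."""
    total_ul = buffer_ul + sum(other_ul)
    return stock_fold * buffer_ul / total_ul, total_ul


# First strand: RNA 22, primer 2, dNTP 4, DTT 2, RNase inhibitor 1, RT 1 (ul)
first_strand = final_buffer_strength(5, 8, [22, 2, 4, 2, 1, 1])

# Vector ligation: vector 47, cDNA 47, water 258, ligase 24 (ul)
ligation = final_buffer_strength(5, 94, [47, 47, 258, 24])

# Library size per ug of starting poly(A)+ RNA
clones_per_ug = 1.1e8 / 2.5

print("first strand:", first_strand)   # (buffer strength, total ul)
print("ligation:    ", ligation)
print(f"clones per ug poly(A)+ RNA: {clones_per_ug:.2g}")
```

Both reactions come out at 1x buffer (40 ul and 470 ul total), and the library corresponds to about 4.4 x 10^7 clones per ug of poly(A)+ RNA.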

Example 2. Preparation of unamplified cDNA library from human fetal kidney

A cDNA library with 2.7 x 10* recombinant clones was constructed from
human fetal kidney RNA. The cDNA was prepared from 5.0 ug of poly(A)+ RNA
using another oligo-dT-VDE primer comprising 18 bases of dT and 60 bases
including 28 bases of the modified VDE recognition site (set off by spaces, with
substituted bases in lower case). This variant, with modification of two bases
(C to T at position -1 and C to G at
position 6 of the original version) and deletion of three deoxyadenines at the 3' end of
the original version, provides a site that can also be cleaved by VDE. The
sequence of the primer is:

5'-CGACGTTGTAAAACGACGGCCAGTGAATTCTt TATGTgGGGTGCGGAGAAAGAGGTAATG TTTTTTTTTTTTTTTTTT-3'

The cDNA library for human fetal kidney was prepared with a protocol similar to that
above, with the following changes: 1) use of the modified oligo-dT-VDE
primer; 2) 2 hours of VDE digestion. This library contains 2.7 x 10* primary
clones with average size of the cDNA inserts at 1.3 kb and less than 1% of
background.
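
The relationship between the wild-type site and the modified primer of Example 2 can be traced with a minimal sketch. It starts from the 31 base wild-type VDE site given in the description and the 32 bases that precede the site in the Example 1 primer, then applies the two substitutions and the 3' deletion. Two points are assumptions made here for illustration: that position -1 is the base immediately 5' of the site, and that position 6 is the sixth base of the site. The function name is hypothetical.

```python
WILD_TYPE_SITE = "TATGTCGGGTGCGGAGAAAGAGGTAATGAAA"   # 31 bases, from the description
UPSTREAM = "CGACGTTGTAAAACGACGGCCAGTGAATTCTC"        # 32 bases preceding the site


def modified_primer_parts(upstream: str, site: str, dt_length: int = 18) -> tuple:
    """Apply C->T at position -1, C->G at position 6 and deletion of the
    three 3' terminal A's, returning (upstream, site, oligo-dT tail)."""
    if upstream[-1] != "C" or site[5] != "C" or not site.endswith("AAA"):
        raise ValueError("input does not match the expected wild-type bases")
    new_upstream = upstream[:-1] + "T"        # position -1: C -> T
    new_site = site[:5] + "G" + site[6:-3]    # position 6: C -> G; drop AAA
    return new_upstream, new_site, "T" * dt_length


upstream, site, tail = modified_primer_parts(UPSTREAM, WILD_TYPE_SITE)
print("modified site:", site, f"({len(site)} bases)")
print("non-dT bases: ", len(upstream) + len(site))
print("full primer:  ", len(upstream) + len(site) + len(tail), "bases")
```

Under these assumptions the modified site is 28 bases long and the primer carries 60 bases ahead of the 18 base oligo-dT tail, which agrees with the lengths stated in Example 2.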

Example 3. High level protein expression in mammalian cells using pEAK10

pEAK10 was prepared from pEAK8 by deleting an inhibitory regulatory
sequence present in the EF-1α promoter of pEAK8 and removing the BspLU11 I
site in the EF-1α promoter. The protein expression levels from pEAK10 are 50%
higher than those for pEAK8.

The LacZ gene from E. coli was cloned in pEAK10 and in three other
commercial vectors, and the resulting plasmids were transfected into 293HEK cells
expressing the EBNA-1 protein and the large T-antigen (293 EBNA-T). The
amount of recombinant protein expressed in 293 EBNA-T cells transfected with
pEAK10 was at least three fold higher than when the same cells were transfected
with any of the other plasmids.

LacZ was cloned by standard methods (Current Protocols in Molecular
Biology, Vol. 1, Ausubel, et al., Eds., John Wiley & Sons, New York (1997)) into
the Hind III-Not I sites of pEAK10 or pCDNA3.1/Hygro (+) [(Invitrogen, cat#
V870-20) (this vector has the CMV promoter and the SV40 origin of replication)],
pREP4 [(Invitrogen, cat# V004-50) (this vector has the RSV promoter, the EBNA-1
expression cassette and an Epstein-Barr virus origin of replication)] or pCEP4
[(Invitrogen, cat# V044-50) (this vector has the CMV promoter, the EBNA-1
expression cassette and an Epstein-Barr virus origin of replication)] to generate
pEAK10-βgal, pCDNA3-βgal, pREP4-βgal or pCEP4-βgal.

5 x 10^5 293 EBNA-T cells were plated in 60 mm Petri dishes containing 5 ml
DMEM medium supplemented with 10% calf serum and incubated at standard
conditions (37°C and 5% CO2) for 24 hours. The medium was then changed and
the plates were incubated in the same conditions for two additional hours. 3 ug of
pEAK10-βgal (or pCDNA3-βgal, pREP4-βgal or pCEP4-βgal; three samples
of each plasmid) were pipetted into a microtube and the following components were
added in the following order: up to 225 ul water, 25 ul 2.5 M CaCl2, 250 ul [50 mM
HEPES, pH 7.05, 1.26 mM Na2HPO4, 140 mM NaCl]. The mix was vortexed
briefly, incubated at room temperature for one minute and then added dropwise to
the cell cultures. After three hours of incubation at standard conditions, the medium
was changed and the transfected cells were incubated for 1, 2 or 3 days. (A total of
twelve experiments were done, resulting from the combination of four different
plasmids at three expression times.)

To harvest the recombinant protein (P-galactosidase), the cells were washed 
once with PBS and collected in 1 ml PBS, spun at 250 g for 5 minutes and 
resuspended in 100 ul 0.25 M Tris.ClH pH 8. The cells were then lysed by three 
freeze-thaw (liquid nitrogen/37°C water bath) cycles and the insoluble material was 
pelleted at 12000 g for 5 minutes at 4°C. 

β-galactosidase liquid assays were done by standard protocols (Current 
Protocols in Molecular Biology, Vol 1, Ausubel, et al., Eds, John Wiley & Sons, 
New York (1997)), and samples were quantified against a standard curve using 
purified E. coli β-galactosidase (Sigma, St. Louis, MO). 
The levels of β-galactosidase (expressed as % of β-galactosidase per total amount 
of protein) showed that pEAK10 was superior in protein expression to any of the 
other plasmids at any given time (Table I). 
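
Quantification against a standard curve, as described above, can be sketched as follows. This is a minimal illustration and not the procedure of the cited protocol: it assumes that the assay signal is linear in the amount of purified enzyme over the range of the standards, and the function names and all numerical values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired standard points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standard amounts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_signal(signal, slope, intercept):
    """Read an enzyme amount off the fitted standard curve."""
    if slope == 0:
        raise ValueError("flat standard curve cannot be inverted")
    return (signal - intercept) / slope


def percent_of_total(enzyme_ng, total_protein_ng):
    """Express the enzyme as a percentage of total protein."""
    return 100.0 * enzyme_ng / total_protein_ng


if __name__ == "__main__":
    # Hypothetical standards: ng of purified enzyme vs. assay signal.
    standards_ng = [0.0, 1.0, 2.0, 4.0]
    standards_signal = [0.0, 0.1, 0.2, 0.4]
    slope, intercept = fit_line(standards_ng, standards_signal)

    # Hypothetical lysate reading and total protein content.
    enzyme_ng = amount_from_signal(0.25, slope, intercept)
    print(percent_of_total(enzyme_ng, 500.0), "% of total protein")
```

Values obtained in this way are only meaningful for sample readings that fall within the range covered by the standards.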





         pEAK10-βgal   pCDNA3-βgal   pREP4-βgal   pCEP4-βgal
day 1    0.55          0.45          0.21         0.35
day 2    0.87          0.59          0.20         0.40
day 3    1.05          0.50          0.31         0.48

TABLE I: Comparison of expression levels of β-galactosidase (shown as % of 
β-galactosidase relative to total protein) in pEAK10 versus three other commercial 
plasmids at different time points. 
What is claimed is: 



1. A method for inserting one DNA sequence within another DNA sequence 
with little or no risk of cleaving the inserted DNA, comprising the steps of: 

a. preparing a double stranded DNA sequence to be inserted (insert 
DNA) by ligation of a double stranded oligonucleotide which either 
comprises a sequence that is recognized and cleaved by an 
endonuclease for which less than 100 recognition and cleavage sites 
exist within the genome of the target or which when ligated with a 
vector reconstitutes a sequence that is recognized and cleaved by an 
endonuclease for which less than 100 recognition and cleavage sites 
exist within the genome of the target; 

b. cleaving the insert DNA with an enzyme which recognizes and 
cleaves the insert DNA at the rare endonuclease recognition and 
cleavage site provided with the primer; 

c. ligating the cleaved insert DNA to the DNA sequence within which 
it is to be inserted (vector DNA), wherein the ends of the vector 
DNA are complementary to the ends of the cleaved insert DNA. 

2. A method for inserting one DNA sequence within another DNA sequence 
with little or no risk of cleaving the inserted DNA, comprising the steps of: 
a. preparing a double stranded DNA sequence to be inserted (insert 
DNA) from RNA or DNA (target) using a nucleotide polymer 
(primer) comprising two elements, an element at one end of the 
primer that is the site for initiation of polymerization of the 
complementary DNA strand and another element at the other end of 
the primer that comprises one strand of a double stranded DNA 
sequence that is recognized and cleaved by an endonuclease for 
which less than 100 recognition and cleavage sites exist within the 
genome of the target; 
b. preparing a second strand of DNA using the first strand as a 
template; 

c. cleaving the insert DNA with an enzyme which recognizes and 
cleaves the insert DNA at the rare endonuclease recognition and 
cleavage site provided with the primer; 

d. ligating the cleaved insert DNA to the DNA sequence within which 
it is to be inserted (vector DNA), wherein the ends of the vector 
DNA are complementary to the ends of the cleaved insert DNA. 

3. A method for orienting one DNA sequence within another DNA sequence 
with little or no risk of cleaving the inserted DNA, comprising the steps of: 

a. preparing a double stranded DNA sequence to be inserted (insert 
DNA) from RNA or DNA (target) using a nucleotide polymer 
(primer) comprising two elements, an element at one end of the 
primer that is the site for initiation of polymerization of the 
complementary DNA strand and another element at the other end of 
the primer that comprises one strand of a double stranded DNA 
sequence that is recognized and cleaved by an endonuclease for 
which less than 100 recognition and cleavage sites exist within the 
genome of the target; 

b. preparing a second strand of DNA using the first strand as a 
template; 

c. cleaving the insert DNA with an enzyme which recognizes and 
cleaves the insert DNA at the rare endonuclease recognition and 
cleavage site provided with the primer, wherein the ends of the 
cleaved insert DNA are distinct and not self-complementary; 

d. ligating the cleaved insert DNA to the DNA sequence within which 
it is to be inserted (vector DNA), wherein the ends of the vector 
DNA are distinct and not self-complementary and are 
complementary to the ends of the cleaved insert DNA. 

4. The method of claim 1 in which one end of the insert DNA is prepared by 
ligating non self-complementary adaptors to the DNA before cleaving with 
an enzyme which recognizes and cleaves the rare endonuclease recognition 
and cleavage site provided with the primer. 

5. The method of claim 2 in which one end of the insert DNA is prepared by 
ligating non self-complementary adaptors to the DNA before cleaving with 
an enzyme which recognizes and cleaves the rare endonuclease recognition 
and cleavage site provided with the primer. 

6. The method of claim 3 in which one end of the insert DNA is prepared by 
ligating non self-complementary adaptors to the DNA before cleaving with 
an enzyme which recognizes and cleaves the rare endonuclease recognition 
and cleavage site provided with the primer. 

7. A synthetic nucleotide sequence comprising the recognition and cleavage 
site for an intron endonuclease. 

8. The nucleotide sequence of claim 7 which comprises less than a whole 
genome. 

9. A plasmid vector which contains an insertion site for DNA that when 
cleaved gives two distinct and non self-complementary single stranded DNA 
sequences. 

10. The vector of claim 9 which contains a bacterial origin of replication and a 
gene conferring a drug resistance in bacteria. 

11. The vector of claim 9 which contains a gene located in the DNA insertion 
site that confers a conditionally lethal phenotype. 

12. The vector of claim 11 in which the conditionally lethal gene is KilA. 

13. The vector of claim 10 in which an EF1α promoter is upstream of the insertion 
site for the cDNA. 

14. The vector of claim 13 in which the insertion site for the DNA is followed 
by a human IgG1 H/CH2 splice sequence and a human polyadenylation 
signal sequence. 

15. The vector of claim 12 which is able to express the puromycin acetyl transferase 
gene. 

16. The vector of claim 12 which contains the EBNA-1 transcription unit, and 
an Epstein-Barr virus origin of replication. 
17. The vector of claim 15 which contains an SV40 origin of replication. 

18. The vector of claim 8 which contains a mammalian expression unit. 

19. The vector of claim 18 that contains an origin of replication for mammalian 
cells. 

20. A plasmid vector which contains the EF1α promoter, the EBNA-1 
transcription unit, and an Epstein-Barr virus origin of replication. 

21. A method for the preparation of oriented cDNA libraries comprising the 
steps of: 

a. preparing DNA from RNA using a nucleotide polymer (primer) 
comprising two elements, an element at one end of the primer that is 
the site for initiation of polymerization of the complementary DNA 
strand and another element at the other end of the primer that 
comprises one strand of a double stranded DNA sequence that is 
recognized and cleaved by an endonuclease for which less than 100 
recognition and cleavage sites exist within the genome of the target; 

b. preparing a second strand of DNA using the first strand as a 
template; 

c. cleaving the cDNA with an enzyme which recognizes and cleaves the 
cDNA at the rare endonuclease recognition and cleavage site 
provided with the primer, wherein the ends of the cDNA are distinct 
and non self-complementary; 

d. ligating the cleaved cDNA to a plasmid vector, wherein the 
ends of the vector DNA are distinct and not self-complementary and 
are complementary to the ends of the cDNA. 

22. The method of claim 21 in which one end of the cDNA is prepared by 
ligating non self-complementary adaptors to the cDNA before cleaving with an 
enzyme which recognizes and cleaves the rare endonuclease recognition and 
cleavage site provided with the primer. 

23. The method of claim 22 in which the adaptors are phosphorylated. 

24. A method for production of proteins comprising the step of: 
transfecting mammalian cells with a vector which comprises an EF1α 
promoter, an Epstein-Barr origin of replication, and an EBNA-1 transcription 
element. 

25. The method of claim 24 wherein the mammalian cells express the T antigen 
of SV40 and the vector comprises the SV40 origin of replication. 

26. The vector of claim 20 which comprises the SV40 origin of replication. 